Comparing output results from SingleR

In this notebook, we examine the annotation results obtained with SingleR. These results were generated in R, in the R Markdown notebook SingleR_K.Rmd (rendered as SingleR_K.html).

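In this exercise, we first predict labels for the PBMC3k dataset using the Monaco, Granja and Kotliarov reference datasets, as well as Granja and Kotliarov combined.

The Granja and Kotliarov datasets are available on Zenodo and provided with Besca; the Monaco reference comes from the celldex package.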

The PBMC3k predictions from SingleR will be compared with the auto-annot predictions obtained for the manuscript.

PBMC3k prediction

We load the dataset predicted with Auto-Annot.

This PBMC3k object stores multiple scores and annotations. leiden is the Leiden clustering of the whole dataset after standard filtering. dblabel is the result of the sig-annot procedure on the whole dataset, with T cells and NK cells reclustered for a finer-grained annotation. We rename dblabel to sig_annot for clarity. Finally, auto_annot is the result of the auto-annot procedure using the Granja and Kotliarov datasets as training sets.
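A minimal sketch of the renaming step, assuming the annotations live in the AnnData obs table (a pandas DataFrame). The helper name rename_sig_annot and the toy labels are hypothetical; on the real object the same effect is a single rename applied to adata.obs.

```python
import pandas as pd


def rename_sig_annot(obs: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of obs with the 'dblabel' column renamed to 'sig_annot'.

    Hypothetical helper: assumes 'dblabel' holds the sig-annot labels.
    """
    if "dblabel" not in obs.columns:
        raise KeyError("expected a 'dblabel' column in obs")
    return obs.rename(columns={"dblabel": "sig_annot"})


# Toy table standing in for adata.obs of the PBMC3k object
obs = pd.DataFrame(
    {
        "leiden": ["0", "1", "1"],
        "dblabel": ["CD4 T cell", "B cell", "B cell"],
        "auto_annot": ["CD4 T cell", "B cell", "monocyte"],
    },
    index=["cell_1", "cell_2", "cell_3"],
)
obs = rename_sig_annot(obs)
print(list(obs.columns))  # ['leiden', 'sig_annot', 'auto_annot']
```

With a loaded AnnData object this would be used as adata.obs = rename_sig_annot(adata.obs); column order and cell index are preserved.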

SingleR prediction

We load the predictions obtained with SingleR and add them to our h5ad object.

SingleR predictions were obtained in R; see SingleR_K.html / SingleR_K.Rmd.
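The R-to-Python hand-off can be sketched as follows, assuming the SingleR result table was exported from R to CSV with cell barcodes in the first column and SingleR's labels column alongside. The file layout, the helper add_singler_labels and the target column name singler_monaco are assumptions for illustration, not the code used in SingleR_K.Rmd.

```python
import io

import pandas as pd


def add_singler_labels(obs: pd.DataFrame, singler: pd.DataFrame,
                       key: str, label_col: str = "labels") -> pd.DataFrame:
    """Copy SingleR labels into obs[key], aligned on cell barcodes.

    Cells missing from the SingleR table get NaN, so a barcode mismatch
    shows up as missing values instead of silently shifted labels.
    """
    out = obs.copy()
    out[key] = singler[label_col].reindex(out.index)
    return out


# Stand-in for a CSV written from R; first column = cell barcodes.
csv_text = """,labels,pruned.labels
AAAC-1,B cells,B cells
AAAG-1,Monocytes,
"""
singler = pd.read_csv(io.StringIO(csv_text), index_col=0)

obs = pd.DataFrame(index=["AAAG-1", "AAAC-1", "AACT-1"])
obs = add_singler_labels(obs, singler, key="singler_monaco")
print(obs["singler_monaco"].tolist())
```

Aligning on the barcode index rather than on row order guards against the R and Python objects having cells in a different order.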

Mapping SingleR labels from celldex references

When using the single-cell reference datasets provided for SingleR in the celldex package, we need to map the predicted labels back to Besca labels using the SingleR nomenclature described in Besca's CellTypes_v1.tsv.

We do an approximate mapping: if a subtype is not found, we fall back to the broader cell type. We also do an exact mapping, i.e. we keep the original label when the cell type is not currently supported by Besca.
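The two mapping strategies can be sketched in plain Python. The lookup tables below are hypothetical excerpts standing in for the correspondence in CellTypes_v1.tsv, the label names are illustrative, and the walk up to a broader type is our reading of the approximate mapping.

```python
from typing import Dict

# Hypothetical excerpt of a SingleR -> Besca label correspondence.
SINGLER_TO_BESCA: Dict[str, str] = {
    "Naive CD4 T cells": "naive CD4 T cell",
    "T cells": "T cell",
    "B cells": "B cell",
}

# Hypothetical subtype -> broader type links (SingleR names).
PARENT: Dict[str, str] = {
    "Th1 cells": "T cells",
    "Plasmablasts": "B cells",
}


def map_label(label: str, exact: bool) -> str:
    """Map one SingleR label to the Besca nomenclature.

    exact=True: direct correspondence only; unsupported labels are kept as-is.
    exact=False: if the (sub)type is unknown, walk up to broader types.
    """
    if label in SINGLER_TO_BESCA:
        return SINGLER_TO_BESCA[label]
    if not exact:
        seen = {label}
        parent = PARENT.get(label)
        while parent is not None and parent not in seen:
            if parent in SINGLER_TO_BESCA:
                return SINGLER_TO_BESCA[parent]
            seen.add(parent)
            parent = PARENT.get(parent)
    return label  # no Besca equivalent: keep the SingleR label


print(map_label("Th1 cells", exact=False))  # T cell
print(map_label("Th1 cells", exact=True))   # Th1 cells
```

Keeping unsupported labels rather than dropping them means those cells still appear in the reports, which makes gaps in the nomenclature visible.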

For datasets that were already annotated with Besca (i.e. Granja and Kotliarov), labels are already consistent and the comparison is straightforward.

Comparing predictions

We fix the palette so that each cell type keeps the same color across all UMAPs.
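One way to fix the palette is to build a single cell type to color dictionary over the union of all annotation columns. This is a sketch: the helper fixed_palette is hypothetical, and the hex codes are simply matplotlib's tab10 colors.

```python
from typing import Dict, Iterable, List

# tab10 hex codes; colors repeat if there are more than 10 cell types.
BASE_COLORS: List[str] = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
    "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
]


def fixed_palette(*label_sets: Iterable[str]) -> Dict[str, str]:
    """Assign one color per cell type over the union of all annotations.

    Sorting the union makes the assignment independent of which
    annotation column is plotted, so a cell type keeps its color.
    """
    celltypes = sorted(set().union(*map(set, label_sets)))
    return {ct: BASE_COLORS[i % len(BASE_COLORS)]
            for i, ct in enumerate(celltypes)}


palette = fixed_palette(["B cell", "monocyte"], ["NK cell", "B cell"])
print(palette)
```

In the notebook this would be called on all prediction columns at once (e.g. sig_annot, auto_annot and the SingleR columns). Recent scanpy versions accept a dict mapping categories to colors as the palette argument of sc.pl.umap; check this against the installed version.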

Prediction report generation

AutoAnnot vs Sig-annot

Monaco

Granja

Kotliarov

Granja + Kotliarov combined

Comparing F1 scores

Besca reports include overall metrics for each model. We retrieved the overall F1 and accuracy scores from all generated reports in order to compare them.
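For reference, the two overall metrics can be computed directly from a pair of label columns. This is a plain-Python sketch, not Besca's report code; we assume the "overall F1" is a macro or support-weighted average, so both are returned. Labels are taken as the union of reference and predicted labels, which should agree with scikit-learn's f1_score(average="macro" / "weighted") defaults on the same inputs.

```python
from collections import Counter
from typing import Dict, Sequence


def overall_scores(true: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro- and support-weighted F1 for two label sequences."""
    if len(true) != len(pred) or len(true) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    tp = Counter(t for t, p in zip(true, pred) if t == p)  # true positives
    n_true = Counter(true)  # support per reference label
    n_pred = Counter(pred)  # number of predictions per label
    labels = sorted(set(n_true) | set(n_pred))
    # Per-label F1 = 2 TP / (2 TP + FP + FN) = 2 TP / (n_true + n_pred)
    f1 = {lab: 2 * tp[lab] / (n_true[lab] + n_pred[lab]) for lab in labels}
    n = len(true)
    return {
        "accuracy": sum(tp.values()) / n,
        "macro_f1": sum(f1.values()) / len(labels),
        "weighted_f1": sum(f1[lab] * n_true[lab] for lab in labels) / n,
    }


scores = overall_scores(["B", "B", "T", "T", "M"], ["B", "T", "T", "T", "B"])
print(scores)  # accuracy 0.6, weighted_f1 0.52
```

The weighted average favors abundant cell types, so it can stay high even when a rare population is missed entirely; looking at both averages helps interpret the comparison.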

Prediction with the Monaco dataset is problematic, as was already visible in the UMAP and in the generated reports (most cells were predicted as monocytes). Note that, because of an installation issue, the data were normalized with Seurat rather than with scuttle, which was not available.

Overall, SingleR and auto-annot predictions are close to each other (F1 = 0.613). When both are trained on Granja + Kotliarov, the auto-annot predictions are slightly closer to the sig-annot labels on PBMC3k (auto-annot F1 = 0.67; SingleR F1 = 0.65).

Observing cross-prediction

Granja predicted with Kotliarov

Kotliarov predicted with Granja
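To see where each reference cell type ends up in a cross-prediction, a row-normalized contingency table is more informative than a single score. A sketch with pandas.crosstab, assuming the reference labels and the predictions are two aligned columns; the column names and toy labels are hypothetical.

```python
import pandas as pd


def confusion_fractions(true: pd.Series, pred: pd.Series) -> pd.DataFrame:
    """Rows: reference labels; columns: predicted labels; each row sums to 1."""
    return pd.crosstab(true.rename("reference"), pred.rename("predicted"),
                       normalize="index")


# Toy example: Kotliarov cells with their own labels vs. a Granja-trained model.
obs = pd.DataFrame({
    "celltype": ["B cell", "B cell", "NK cell", "NK cell"],
    "pred_granja": ["B cell", "B cell", "NK cell", "CD8 T cell"],
})
table = confusion_fractions(obs["celltype"], obs["pred_granja"])
print(table.round(2))
```

Here half of the toy NK cells are predicted as CD8 T cells, the kind of confusion between closely related populations that an overall F1 score hides.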

END